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Density-functional theory has been one of the most successful approaches ever to address the 
electronic-structure problem; nevertheless, since its implementations are by necessity approximate, 
they can suffer from a number of fundamental qualitative shortcomings, often rooted in the remnant 
electronic self-interaction present in the approximate energy functionals adopted. Functionals that 
strive to correct for such self-interaction errors, such as those obtained by imposing the Perdew- 
Zunger self-interaction correction [Phys. Rev. B 23, 5048 (1981)] or the generalized Koopmans' 
condition [Phys. Rev. B 82, 115121 (2010)], become orbital dependent or orbital-density depen- 
dent, and provide a very promising avenue to go beyond density-functional theory, especially when 
studying electronic, optical and dielectric properties, charge-transfer excitations, and molecular dis- 
sociations. Unlike conventional density functionals, these functionals are not invariant under unitary 
transformations of occupied electronic states, which leave the total charge density intact, and this 
added complexity has greatly inhibited both their development and their practical applicability. 
Here, we first recast the minimization problem for non-unitary invariant energy functionals into the 
language of ensemble density-functional theory [Phys. Rev. Lett. 79, 1337 (1997)], decoupling the 
variational search into an inner loop of unitary transformations that minimize the energy at fixed 
orbital subspace, and an outer-loop evolution of the orbitals in the space orthogonal to the occupied 
manifold. Then, we show that the potential energy surface in the inner loop is far from convex 
parabolic in the early stages of the minimization and hence minimization schemes based on these 
assumptions are unstable, and present an approach to overcome such difficulty. The overall for- 
mulation allows for a stable, robust, and efficient variational minimization of non-unitary-invariant 
functionals, essential to study complex materials and molecules, and to investigate the bulk thermo- 
dynamic limit, where orbitals converge typically to localized Wannier functions. In particular, using 
maximally localized Wannier functions as an initial guess can greatly reduce the computational costs 
needed to reach the energy minimum while not affecting or improving the convergence efficiency. 



I. INTRODUCTION 

Density functional theory (DFT) 0, Q has become 
the basis of much computational materials science today, 
thanks to its predictive accuracy in describing ground- 
state properties directly from first principles. While DFT 
is in principle exact, in any practical implementation it 
requires an educated guess for the exact form of the en- 
ergy functional. For many years, local or semi-local ap- 
proximations to the exchange-correlation energy, such as 
the local density approximation (LDA) 0,01 or the gener- 
alized gradient approximation [5[ have been successfully 
applied to a wealth of different systems [6]. Still, these 
approximations lead to some dramatic failures, including 
the overestimation of dielectric response, incorrect chem- 
ical barriers for reactions involving strongly-localized or- 
bitals @, [H> energies of dissociating molecular species, 
and excitation energies of charge-transfer complexes, to 
name a few Q. 

Key to these failures is the self-interaction error of 
approximate DFT 0, 0, where the electrostatic and 
exchange-correlation contributions to the effective energy 



of the entire charge distribution are not "purified" from 
this spurious self interaction of an individual electron 
with itself. To address this issue, Perdew and Zunger 
(PZ) introduced first an elegant solution to this prob- 
lem, where a self-interaction correction (SIC) is added to 
the total energy calculated from approximate DFT (e.g. 
within the LDA [3l \m) , but practical applications have 
remained scarce [10Hl9l ]. 

An important property of DFT with local or semi-local 
exchange-correlation functionals is the invariance of the 
total energy with respect to unitary transformation of the 
occupied electronic states. However, SIC-DFT does not 
have this invariance property, and in fact finding the op- 
timal unitary transformation given a set of orbital wave- 
functions is crucial to the numerically consistent mini- 
mization of density functionals with SIC [12 , [Uill. In 
this paper, we focus on the variational minimization of 
energy functionals that do not satisfy unitary invariance 
in order to provide a stable, robust, and efficient deter- 
mination of the electronic structure in this challenging 
case. In particular, we adopt the formulation of ensemble 
DFT |26| to decouple the variational minimization into 
an inner loop of unitary transformations and an outer 
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loop of evolution for the occupied manifold, and suggest 
optimal strategies for the dynamics of unitary transfor- 
mations. In the solid-state limit, this dynamics gives rise 
to a localized Wannier representation for the electronic 
states, and we assess their relation with maximally local- 
ized Wannier functions (MLWFs) [27, 28 1 as obtained in 
the absence of SIC. 

The remainder of the paper is organized as follows. In 
Sec. II, DFT with SIC is briefly reviewed, the method 
of inner-loop minimization is explained, and the issue of 
using MLWFs as an initial guess for the wavefunctions 
is discussed. In Sec. Ill, we present and discuss the re- 
sults. First, we present results on how the total energy 
varies with the unitary transformation of the occupied 
electronic states. Second, we discuss the stability and 
efficiency of our method for inner-loop minimization. Fi- 
nally, we show how the calculated total energy converges 
both as a function of the outer-loop iterations and as a 
function of the CPU time and discuss the optimal scheme 
for total energy minimization of energy functionals with 
SIC. We then summarize our findings in Sec. IV. 



II. METHODOLOGY 
A. Background 

For simplicity, we consider in the following the wave- 
functions to be real; however, the discussion can straight- 
forwardly be extended to complex wavefunctions. The 
total energy of the interacting electron system from 
Kohn-Sham DFT within the LDA is given by 
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+ 2 J J '\v-v'\ drdr ' + J 4 UA (p(r))p(r)dr, (1) 

where a is the spin index, the band index i runs 
through the N occupied electronic states, and p(r) = 

J2cr J2i=i \^P<?i{ Y )\ 2 1S the total charge density. The first 
term on the right hand side of Eq. (fTJ) is the kinetic 
energy, the second term the interaction energy between 
electrons and the ion cores, the third term the Hartree 
interaction energy, and the last term the exchange- 
correlation energy. This energy functional -ELDAKVVi}] 
is invariant under the following unitary transformation 
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(2) 



for an arbitrary unitary matrix O a since the total charge 
density p(r) and the kinetic energy [Eq. ([I])] are invariant 
under this transformation. Given that the wavefunctions 



are real, we consider O a to be an orthogonal matrix, 
i. e. , real and satisfying Ojj-Oo- = I where / is the N x N 
identity matrix. 

For some density functionals with SIC 0, , the total 
energy £ to tai[{VVi}] is given by 

£total[{?M] = £LDA[{?M] + E S Ic[{pai}} , (3) 

where p a i(r) = |Vvi(r)| 2 . E S ic[{pai}} and hence 
-EtotaiKVVi}] are in general not invariant under orthogo- 
nal transformations because they are dependent not only 
on the total charge density p(r), which is invariant un- 
der orthogonal or unitary transformation, but also on the 
charge densities, p CT i(r)'s, arising from different orbitals. 

This can be seen by considering how the SIC energy 
varies under the orthogonal transformation of Eq. @. To 
this end, it is useful to recall that an orthogonal matrix 
O a can be written as 



0„ = e A ° 



(4) 



where A a is an antisymmetric matrix; if we further con- 
sider the case where the norm of A„ is much less than 
that of an identity matrix, we can assume 

O a w / + A a . (5) 

Therefore, the transformed wavefunctions are given by 
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and (using the antisymmetry of A a ) 
dpai(r) 



dA, 
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Finally, if we define the SIC potential 

Spai{r) ' 



(6) 
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we obtain the gradient of SIC energy with respect to the 
transformation matrix elements 



dE t 



sic 



dA a 



= 2 / <Mr) [vlf (r) - vlf{v)} ^-(r) dr , (10) 



which is a result originally obtained by Pederson et 
al. [20(. Note that this gradient matrix G a is also an- 
tisymmetric, just like Aa-. Therefore, at an energy mini- 
mum, the wavefunctions satisfy 

= / <Mr) [t$ c (r) - ^ c (r)] ^(r) dr , (11) 
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which was referred to as the "localization condition" by 
Pederson et al. [20j . 

To date, the most widely used SIC scheme is PZ SIC Q 
(and its few refinements, e.g. , Refs. 3, 3(| 31|). In PZ 
scheme, the SIC energy is given by 
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- EE/ 4 DA (P-(r))p CTi (r)rfr. (12) 

o- i=l ^ 

The rationale underlying PZ SIC is both simple and 
beautiful: correcting the total energy by subtracting the 
incorrect energy contributions from the interaction of an 
electron with itself — i. e. , the Hartree, exchange, and 
correlation energies. Hence PZ SIC is exact for one- 
electron systems, or in the limit where the total charge 
density can be decomposed into non-overlapping one- 
electron charge density contributions. 

Recently, an alternative scheme suitable for many- 
electro n sy stems based on the generalized Koopmans con- 
dition 32] was introduced in Ref. 1291. In brief, one could 



start from Janak's theorem [33| that states that in DFT 
the orbital energy e CT i(/) with fractional occupation of a 
state being = / is 



»(/) = 



(13) 
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where E ai is the Kohn-Sham total energy minimized 
under the constraint j a i = f. If there were no self- 
interaction, the orbital energy of a state e CT i(/) would 
not change upon varying its own occupation /. In other 
words, for a self-interaction- free functional, 



£«(/) = constant (0 < / < 1) 



(14) 



Alternatively, using Janak's theorem [33], this can be 
rewritten as 

A £Koop m ans (/) = ^ (/ ^ _ ^ (Q) = ^ 

(0</<l), (15) 

which is equivalent to the generalized Koopmans theo- 
rem [29j], telling us that the total energy varies linearly 
with the fractional occupation In conventional DFT, 
however, Eq. (jT4"]) or Eq. (fT5")) does not hold and instead, 



Eai(fai) — E ai (0) 



e ri {f)df. (16) 



FromEqs. (fT5j) and (fl"6|) . the non-Koopmans (NK) energy 
n<ri(/) - i.e., the deviation from the linearity for the 
energy versus occupation - can be defined as [29J 



U ai (f) = A£^ oopmatls (/) — AE^i 

' " M/) - e CTi (/')] df. 



(17) 



From this result, the SIC energy term based on the gen- 
eralized Koopmans theorem has been defined as 



JV 
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(18) 



where / ro f is a reference occupation factor (for many- 
electron systems, / rc f = \ was shown to be the best 
choice (ii|). 

The total energy versus (fractional) number of elec- 
trons relation calculated by exact DFT should be piece- 
wise linear with slope discontinuities at integral electron 
occupations 3J[; however, within the LDA, this energy 
versus occupation relation is piecewise convex |9|. The 
LDA deviation from the piecewise linearity is the main 
reason for the failures of approximate DFTs [9j. The 
new SIC functional [Eq. (fTSjl ] is introduced to cure this 
pathology and to recover the piecewise linearity of ex- 
act DFT [H. The (bare) NK SIC discussed above and 
its screened version explain some of the most important 
material properties such as ionization energy and elec- 
tron affinity better than PZ SIC. We refer the reader to 
Ref. [H for the details of NK SIC. 



B. Implementation 

In order to implement a variational minimization of 
the total energy functional, we adop t the same strategy 
as the ensemble-DFT approach |26[, decoupling the dy- 
namics of orbital rotations in the occupied subspace and 
that of orbital evolution in the manifold orthogonal to 
the occupied subspace. In explicit terms, we minimize 
the SIC energy through 



min E S ic[{ipU}] = r mi n min Esic[{ip<ri} , 

(19) 

where {V 7 ^} and {i/vi} are connected by an orthogonal 
transformation {O ct } [Eq. ([2])]. Minimization over the ba- 
sis orbital wavefunctions {ipa-i} and that over the orthog- 
onal transformation {Oa-i} - inside the round parenthesis 
in Eq. (|19p - correspond to the outer-loop minimization 
and inner-loop minimization, respectively, i. e. , given the 
orbital wavefunctions, an optimal orthogonal transfor- 
mation is searched and then the orbital wavefunctions 
are evolved. This process is repeated until convergence. 
Ensemble-DFT minimization has also been discussed in 
studying the SIC problem by Stengel and Spaldin 12 1 
and by Klupfel, Kliipfel, and Jonsson |25j |. 

The main focus here is on inner-loop minimization. 
The gradient matrix Gcrij = dEgic/dA^ij in Eq. (flQ|) is 
antisymmetric and real; hence, — i G a is Hermitian (and 
purely imaginary). Therefore, —i G a can be diagonalized 
as 



iG a = Ul D a XJ a , 



(20) 
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G a = iUl D a U a , 



where U a is a unitary matrix and 



(21) 



(22) 



a real diagonal matrix. From Eq. (|2T|) . we evolve the 
matrix A a along the energy gradient with a step of size I 



AA a = -lG a = -ilUlD a U a , 
calculate the updated orthogonal matrix 

„AA„ T-rt -HD 



(23) 



(24) 



and then transform the wavefunctions accordingly. 

Here, we use the steepest-descent method for the inner- 
loop minimization. But one could employ other methods 
such as damped dynamics or conjugate gradients. In each 
of the inner-loop steps, we evaluate the SIC energy with 
two different sets of wavefunctions: first by using the 
given wavefunctions [-EsicO — 0)] and second by using 
the wavefunctions transformed by O a in Eq. (|24l) with 
a trial step I = 'trial [E$ic{l = 'trial)]- In addition, the 
gradient at I — reads 



dE SIC (l) 



dl 



1=0 



^E 



aij 



dEsic dAA a 



dA 
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1=0 



5 Ei". 



(25) 



where we have used Eqs. (jlOl) and (|23l) . and the fact 
that only half of the matrix elements of G a are indepen- 
dent. Thus, knowing .EsicG = 0), Esic(l = Atrial), and 
dEsic(l)/dl\i=o, we can fit a parabola to Esic(l), yield- 
ing the optimal step I = Optimal and the energy minimum 
Esic(l — 'optimal)- This completes one inner-loop itera- 
tion. We then use the transformed wavefunctions to cal- 
culate the gradient [Eq. (ITU1) ] and repeat iterations until 
the SIC energy converges. 

For optimal convergence, we set the step size based on 
the highest frequency component of the gradient matrix, 



l. e. 



l = jl c [lc = 



A,. 



(26) 



where 7 is a constant of order ^0.1 and A max the maxi- 
mum eigenvalue of D a , 

A m ax = max \ ai . (27) 

The critical step l c should be considered as the point 
when the transformed wavefunctions become apprecia- 
bly different from the original wavefunctions. There- 
fore, when we evolve wavefunctions by using a step much 



larger than l c , a fitting of Egic versus I by a parabola 
will not be successful. Imposing the constraint I = 7 l c 
[Eq. (j2"o) ] when necessary is the key part of our method: 

(i) We set the trial step of the first iteration of the inner- 
loop minimization according to Eq. (|26[) . (In subsequent 
iterations, the trial step 'trial is set based on the optimal 
step of the previous iteration: we set it to be twice the 
optimal step of the previous iteration.) By setting the 
initial trial step based on the eigenspectrum of the gra- 
dient matrix, we make the inner-loop process unaffected 
by the absolute magnitude of the SIC energy gradient 
with respect to the orthogonal transformation [Eq. (Tl0|) ] . 

(ii) When the calculated optimal step is larger than jl c , 
we set Optimal = 7'c- This procedure has proven to be 
instrumental when -Esic(') versus I relation cannot be 
fitted well by a parabola. In such cases, the calculated 
'optimal can be much larger than l c . A similar scaling 
method based on the highest frequency component of the 
gradient matrix was used in finding the MLWFs 27 , 3l| . 



C. MLWFs as an initial guess for the wavefunctions 

SIC tends to localize the orbital wavefunctions [note 
e. g. , that the Hartree term in Eq. ([12")) will be more neg- 
ative if the state becomes more localized]. Therefore, it is 
natural to consider using some localized basis functions 
as an initial guess for the wavefunctions of density func- 
tionals with SIC. To this end, employing MLWFs USS] 
represents a very promising initial-guess strategy. Al- 
though the possibility of using MLWFs in this regard was 
recently suggested [36j, no literature is available on the 
merit of that scheme. We address this issue in conjunc- 
tion with the inner-loop minimization method discussed 
in the previous subsection. 



D. Computational details 

We performed DFT calculations with norm-conserving 
pseudopotentials [37j in the LDA i4| using the Car- 
Parrinello (CP) code of the Quantum ESPRESSO distri- 
bution [38| with the inner-loop minimization described 
in the previous subsections, and a conventional damped 
dynamics algorithm for the outer-loop minimization. We 
have performed calculations on both PZ SIC [4] and NK 
SIC [29( . Except for the case of investigating the effect of 
using MLWFs as an initial guess for the wavefunctions, 
we have used LDA wavefunctions with some arbitrary 
phases - they are not LDA eigenstates - when we start 
the calculations. 

We performed calculations on a rather big molecule, 
C20 fullerene. A supercell geometry was used with the 
minimum distance between the carbon atoms in neigh- 
boring supercells larger than 6.7 A. The Coulomb interac- 
tion is truncated to prevent spurious interaction between 
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periodic replicas in different supercells 



III. RESULTS AND DISCUSSION 

In order to find an optimal strategy for the minimiza- 
tion of SIC DFT, it is important to know how the energy 
varies with orthogonal transformations. We first show 
the energy variation along the direction in the orthogonal 
transformation space parallel to the gradient [Eq. ([T0| ] 
of the energy, i. e. , -Esic(0 versus /, where / is a step rep- 
resenting the amount of orthogonal rotation as defined in 
Eq. (|23]h Figure Ufa) shows the results for PZ SIC at a 
few different stages during the inner-loop minimization. 
What we can see is that initially E^^,(l) varies slowly 
with I, and then, in the middle of the inner-loop mini- 
mization, varies fast and then, toward the end of the min- 
imization, varies slowly again. There is no good length 
scale of / which can consistently describe the variation of 
Esic(l) during the entire process of an inner- loop mini- 
mization. The speed of the energy variation at different 
stages of the inner-loop minimization with respect to / 
near / = can however be very well explained by A max 
[Eq. (|27p], which is the fastest frequency component of 
the gradient matrix [Eq. (fit)])], as shown in Fig. [ljb). 

We can draw similar conclusions for NK SIC as shown 
in Figs. [TJc) andQJd). However, there are a few points 
that are worth mentioning. First, the magnitude of NK 
SIC energy is several times smaller than that of PZ SIC 
energy [Figs. QJa) andQJc)]. Second, A max , or the main 
driving force for orthogonal transformation near I = 0, 
for NK SIC is also much smaller than that for PZ SIC, 
although eventually both of them converge to zero at 
energy minima. Because of these differences between dif- 
ferent SIC functionals, it is clear that determining the 
trial step Atrial based on A max will be very useful, even 
more so because A max is also affected by the arbitrary 
initial phases of the wavefunctions, as will be discussed 
later (Fig. [6]). 

Based on the previous discussion, we now show, in 
Fig. [2j i?sic(0 as a function of the scaled step 



scaled 



l/lc 



(28) 



i. e. , / in units of l c . For both PZ SIC and NK SIC, 
the energy variation length scale near / = through the 
entire process of the inner-loop minimization is ~ 0.5 in 
units of /scaled- The results confirm that indeed a natural 
length scale for I that should be used in the inner-loop 
minimization is the / c defined in Eq. (|26|) . One more 
thing to note here is that in both PZ SIC and NK SIC, at 
the initial stages of the inner-loop iterations, the energy 
profile cannot be well fitted by a parabola. This trend is 
dramatic especially for NK SIC, where the Esic(l) versus 
I (or /scaled) relation is concave, not convex, at / = 0. 

This can be best understood using a simple system: 
a carbon atom which has, in our pseudopotential calcu- 



lations, two orbitals (2s and 2p), i.e., it is a two- level 
system. The PZ SIC energy SgicCO versus /scaled is 
shown in Fig. [3] The profile is sinusoidal with a period of 
0.5, rather than parabolic for the entire process of min- 
imization. Notably, the period 0.5 in units of /scaled is 
similar to the previously discussed length scale for C20 
fullerene. The shape of the curve does not change as we 
proceed in the inner-loop minimization; the only varia- 
tion is that the minimum of the curve moves toward the 
origin (Z sca i od = 0). 

We can understand this behavior as follows. The gra- 
dient matrix in Eq. (I10[) for a carbon atom is of the form 



G = 



c 
-c 



'v I 



(29) 



where c is a real constant and a y the Pauli matrix. (We 
dropped the spin index for obvious reasons.) Assuming 
(without losing generality) that c > 0, the maximum 
eigenvalue of G is 



A„ 



(30) 



and the orthogonal transformation matrix [Eqs. (|23[) 
and (j24)) ] is given by 

O = e~ lG = cos (lc) I - i sin (lc) a y , (31) 

or, using Z sca i cd [Eq. ([28])], 

(32) 



O = cos (n / sca i d) I - i sin (n Z sca i c d) & y 



In particular, when / SC aied = 0.5, O = —icr y , and, un- 
der this orthogonal transformation O, ip'i = — V2 and 
ip'2 = ipi, i.e., O just exchanges the two orbital wave- 
functions (plus a trivial sign change). When the original 
wavefunctions ipi and ip2 correspond to the maximum 
SIC energy configuration, the new set of wavefunctions 
ip'i and "02 w iH correspond also to the SIC energy maxi- 
mum. Therefore, the period of £'sic(/) versus / SC aicd will 
be 0.5 in agreement with our calculation [Fig. [3J. (The 
shape of the curve is not exactly sinusoidal and varies 
slightly with the kind of SIC used.) 

For this example, which part of the sinusoidal-like 
curve one starts the inner-loop minimization from de- 
pends on the initial orbital wavefunctions (and an ar- 
bitrary rotation of them). If we start from the LDA 
eigenstates, the SIC energy is at its maximum (roughly 
speaking, the LDA eigenstates are the most delocalized 
and the SIC energy is highest) and the inner-loop mini- 
mization starts from the top of the sinusoidal-like curve, 
and hence (i) the driving force for the orthogonal trans- 
formation is extremely weak (zero at the maximum) and 
(ii) 2?sic(0 versus / SC aied is concave. For these reasons, 
if we do not properly scale /, or if we do not constrain 
/ during the inner-loop minimization process, the mini- 
mization process based on the assumption that the en- 
ergy profile is convex parabolic may become unstable or 
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FIG. 1: (a) Unitary variant part of the total energy within PZ SIC, -Ef IG [Eq. @], for C20 versus step size / [see Eq. (|23f) ] to 
which the amount of rotation of the occupied electronic states is proportional, (b) Maximum eigenvalue A^ax [Eq. (|27|) ] of the 
gradient matrix dE^^/dAij [Eq. (|10[) ] for PZ SIC as a function of the inner- loop iteration steps. The dashed line is a guide to 
the eye. (c) and (d) Similar quantities as in (a) and (b), respectively, for NK SIC. 



extremely slow. This discussion is also relevant to other 
systems, as we have seen in the case of C20 fullerene. 

Figure S{a) compares the performance of the inner- 
loop minimization for the case PZ SIC. In one case 
(dashed or blue curve), we take the optimal step size 
/optimal obtained from fitting i?gfc(/) versus / by a 
parabola from three calculated quantities: Eg^(l = 0), 
Egjc(l = /trial), and dEg^(l)/dl\i = o. In the other case 
(solid or red curve), if the calculated /optimal is larger than 
jl c (with 7 = 0.1) [Eq. ([26])], we set /optimal = lk- Ap- 
parently, by using this constraint based on / c , or, A max , 
the inner-loop minimization process becomes more sta- 
ble and faster. (In both cases, the trial step of the first 
iteration was set to /trial = 7/c) The difference between 
using and not using this A max constraint is dramatic for 
NK SIC [Fig. Hb)]. This again is due to (i) the small 
gradient of the SIC energy with respect to the variation 
of the orthogonal transformation, and (ii) non-concave- 
parabolic dependence of i?sic(0 on /. 

Until now, our focus was on the inner-loop minimiza- 
tion. Now we look at the entire minimization procedure 
including the outer loop. In order to find an optimal min- 
imization strategy, we have performed our calculations by 
restricting the number of inner-loop minimization itera- 
tions per each outer-loop iteration to be less than or equal 



to n max . (However, not every outer-loop iteration will re- 
quire n max inner-loop iterations because the SIC energy 
may be converged earlier during inner-loop minimization. 
We exit the inner loop if the energy difference between 
consecutive iterations is lower than the energy conver- 
gence threshold of 10 ~ 5 Ry.) The case without inner- 
loop minimization is denoted by n max = 0. Figure Eta) 
shows the convergence of PZ SIC energy for various dif- 
ferent choices of n max . In all cases where the inner- loop 
minimization routine is used (i.e. , n max > 0), the total 
number of outer-loop iterations necessary to achieve the 
same level of convergence is much smaller than that when 
no inner-loop minimization is used. This, however, does 
not necessarily mean that the total computation time is 
reduced. In Fig. [Sfb) , we show the CPU time dependence 
of the SIC energy (the results include both the inner-loop 
and outer- loop minimization iterations). Surprisingly, in 
all cases other than n max = 1, inner- loop minimization 
actually slows down the computation for PZ SIC. When 
we set n max = 1, i.e., if the number of inner-loop itera- 
tions per each outer-loop iteration is restricted to 1, we 
find about twice improvement over when no inner-loop 
minimization is performed in terms of the CPU time. 

The case of NK SIC is very different. Figures (He) 
and[5]Jd) shows that inner- loop minimization reduces not 
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FIG. 2: (a) Unitary variant part of the total energy within 
PZ SIC, Elf c [Eq. ©], versus I A max /7r [Eq. flU}] for C 20 at a 
few inner-loop iteration steps, (b) Similar quantity as in (a) 
for NK SIC. 



only the required number of outer- loop iterations but also 
the CPU time significantly. Especially, the CPU time 
is reduced by ~ 20 times when we perform inner-loop 
minimization, and is rather insensitive to n max . 

These results on PZ SIC and NK SIC support that 
the presented method works regardless of the absolute 
magnitude of the SIC energy gradient with respect to 
the orthogonal transformation [Eq. (|T0|) ]. The method 
can be applied to density functionals with other kinds of 
SIC. For example, SIC with screening, for which the total 
energy is given by 



E< 



total 



E\ 



LDA 



a E\ 



SIC 



(«<1), 



(33) 



will have the SIC energy gradient lower in magnitude 
than the unscreened version of SIC (a = 1), and our 
method will be more useful. 

It has to be noted that the relative CPU time among 
different calculations shown in Fig.[5]at different stages of 
the minimization is affected only by the ratio of the CPU 
time for one inner-loop iteration to that for one outer- 
loop iteration. Therefore, the relative CPU time is rather 
insensitive to the complexity of the system studied, and 
in that sense is meaningful. (The absolute CPU time is 
also affected much by the complexity of the system, the 
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FIG. 3: (a) Unitary variant part of the total energy within 
PZ SIC, Ei? c [Eq. ©], versus I A max /7r [Eq. |@S}] for a carbon 
atom at a few inner-loop iteration steps. 



performance and number of processors, etc.) In our case, 
one inner-loop iteration for PZ SIC takes 3.6 times as long 
as one outer-loop iteration and one inner-loop iteration 
for NK SIC takes 2.0 times as long as one outer-loop 
iteration. 

Finally, we discuss how useful it is to use MLWFs 27, 
[2^| as an initial guess for the wavefunctions [25| . The fol- 
lowing description is relevant for both PZ SIC [Figs.[6]Ja) 
and Sib)] and NK SIC [Figs. E^c) andl^d)] and whether 
or not the inner-loop minimization is employed. Fig- 
ure [6] shows that when MLWFs are used, the initial to- 
tal energy is lower than when LDA wavefunctions with 
arbitrary phases is used. On the other hand, the slope 
of log[(current total energy) — (converged total energy)] 
versus either the number of outer-loop iterations or the 
relative CPU time is not very different in the two cases. 
Therefore, it is advantageous to use MLWFs as an ini- 
tial guess for the wavefunctions; however, the lower the 
energy convergence threshold the smaller the relative ad- 
vantage. 



IV. CONCLUSIONS 

In summary, we have developed a variational, stable 
and efficient approach for the total-energy minimization 
of unitary variant functionals, as they appear in self- 
interaction corrected formulations, with a focus on prop- 
erly minimizing the energy by unitary transformations 
of the occupied manifold. In particular, we have shown 
that the energy changes along the gradient direction can 
be very different from being convex parabolic, and sug- 
gested the use of the maximum frequency component of 
the gradient matrix in determining optimal rotations for 
the inner-loop minimization. When maximally localized 
Wannier functions are used as an initial guess for the 
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FIG. 4: (a) Unitary variant part of the total energy within PZ SIC, -Esic [Eq. (©], versus the inner-loop iteration step for C20 
with and without using the A max constraint (see text) during the inner-loop minimization process, (b) Similar quantity as in 
(a) for NK SIC. 




FIG. 5: (a) Logarithm of the difference between the total energy per carbon atom at each outer-loop iteration step and that 
at convergence for PZ SIC, log A£7£,f al , at a few different values of n max . Here, n max is the maximum number of inner-loop 
iteration steps performed in one outer-loop iteration step, n max = being the case without inner-loop minimization, (b) 
Log A_E^,t al versus the CPU time, (c) and (d) Similar quantities as in (a) and (b), respectively, for NK SIC. 



wavefunctions, the initial energy decreases significantly 
from that corresponding to wavefunctions with arbitrary 
phases; however, the logarithmic energy convergence rate 
remains similar in the two cases. We expect that the 
results will be useful for investigating the physical prop- 
erties of complex materials and big molecules with self- 
interaction corrected density functional theory. 

We thank fruitful discussions with Peter Kliipfel and 
Simon Kliipfel. CHP acknowledges financial support 
from Intel Corporation. 
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